The tree based linear regression model for hierarchical categorical variables

Authors

Abstract

Many real-life applications consider nominal categorical predictor variables that have a hierarchical structure, e.g. economic activity data in Official Statistics. In this paper, we focus on linear regression models built in the presence of this type of variable, and study the consolidation of their categories to achieve a better tradeoff between interpretability and the fit of the model to the data. We propose the so-called Tree based Linear Regression (TLR) model, which optimizes both the accuracy of the reduced linear regression model and its complexity, measured as a cost function of the level of granularity in the representation of the hierarchical categorical variables. We show that finding non-dominated outcomes for this problem boils down to solving Mixed Integer Convex Quadratic Problems with Linear Constraints, and that small to medium size instances can be tackled using off-the-shelf solvers. We illustrate our approach on two real-world datasets, as well as a synthetic one, where our methodology finds a much less complex linear regression model with only a very mild worsening of accuracy.

Highlights

• Linear regression with hierarchical categorical predictors.
• We aim to consolidate the information, trading off accuracy and complexity.
• An optimization problem to obtain such a linear regression model is solved.
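To make the accuracy/complexity tradeoff concrete, here is a minimal sketch; it is not the authors' TLR formulation. It assumes a hypothetical two-level hierarchy (parent sectors "A" and "C" with illustrative leaf codes) and hypothetical data, measures complexity simply as the number of categories kept, and replaces the Mixed Integer Convex Quadratic Program with brute-force enumeration of every way to cut the tree, which is only feasible for toy hierarchies. With a single categorical predictor, ordinary least squares on dummy variables predicts each group's mean, so the fit reduces to the within-group sum of squared errors (SSE). The helper names `sse_for`, `candidate_models` and `pareto_front` are hypothetical.

```python
from itertools import product

# Hypothetical two-level hierarchy (parent sector -> leaf activity codes).
# The codes are illustrative only, not taken from the paper.
HIERARCHY = {
    "A": ["A01", "A02"],
    "C": ["C10", "C11", "C12"],
}

# Hypothetical observations: (leaf category, response y).
DATA = [
    ("A01", 1.0), ("A01", 1.2), ("A02", 1.1), ("A02", 0.9),
    ("C10", 3.0), ("C10", 3.2), ("C11", 5.0), ("C11", 5.2),
    ("C12", 3.1), ("C12", 2.9),
]


def sse_for(mapping):
    """Residual sum of squares of OLS on dummies of the consolidated categories.

    With one categorical predictor (plus intercept), least squares predicts
    each group's mean, so the SSE is the within-group sum of squares.
    """
    groups = {}
    for leaf, y in DATA:
        groups.setdefault(mapping[leaf], []).append(y)
    sse = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        sse += sum((y - mean) ** 2 for y in ys)
    return sse


def candidate_models():
    """Yield (complexity, sse, merged) for every cut of the two-level tree.

    Each parent is either kept at leaf level or merged into one category.
    Complexity is assumed here to be the number of categories kept.
    """
    parents = list(HIERARCHY)
    for choice in product([False, True], repeat=len(parents)):
        mapping = {}
        for parent, merge in zip(parents, choice):
            for leaf in HIERARCHY[parent]:
                mapping[leaf] = parent if merge else leaf
        merged = tuple(p for p, m in zip(parents, choice) if m)
        yield len(set(mapping.values())), sse_for(mapping), merged


def pareto_front(models):
    """Keep the candidates not dominated in (complexity, sse)."""
    models = list(models)
    front = []
    for c, e, merged in models:
        dominated = any(
            c2 <= c and e2 <= e and (c2 < c or e2 < e)
            for c2, e2, _ in models
        )
        if not dominated:
            front.append((c, round(e, 4), merged))
    return sorted(front)


front = pareto_front(candidate_models())
for complexity, sse, merged in front:
    print(f"categories={complexity}  SSE={sse}  merged={merged or 'none'}")
```

On this toy data, merging the two "A" codes raises the SSE only from about 0.10 to 0.11, whereas merging the three "C" codes raises it to roughly 5.71, so the cut that merges only "A" is the kind of less complex, nearly as accurate model the abstract describes.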

Similar articles

Role of Categorical Variables in Multicollinearity in the Linear Regression Model

The present article discusses the role of categorical variables in the problem of multicollinearity in the linear regression model. It applies the condition number diagnostic to linear regression models with categorical explanatory variables and analyzes how the dummy variables and the choice of reference category can affect the degree of multicollinearity. Such an effect is analyzed analytically a...

Linear regression model with histogram-valued variables

Histogram-valued variables are a particular kind of variable studied in Symbolic Data Analysis, in which each entity under analysis is described by a distribution that may be represented by a histogram or by a quantile function. Linear regression models for this type of data are necessarily more complex than a simple generalization of the classical model: the parameters cannot be negative; still th...

A multiple linear regression model for LR fuzzy random variables

In standard regression analysis the relationship between one (response) variable and a set of (explanatory) variables is investigated. In a classical framework the response is affected by probabilistic uncertainty (randomness) and, thus, treated as a random variable. However, the data can also be subjected to other kinds of uncertainty, such as imprecision. A possible way to manage all of these...

Journal

Journal title: Expert Systems With Applications

Year: 2022

ISSN: 1873-6793, 0957-4174

DOI: https://doi.org/10.1016/j.eswa.2022.117423